gpt-oss-20b
Overall Accuracy: 25.5%
Answer Key: claude-opus-4-5-20251101
Boundary Models: 19
Pairs: 171
Total Rollouts: 855
Max Turns: 5
Pairwise Accuracy Matrix